The prognostic value and immunological role of CD44 in pan-cancer study

This study aimed to investigate the correlation between cluster of differentiation-44 (CD44) expression and immunotherapy response and to identify its possible predictive value in pan-cancer. Datasets of 33 cancer types from The Cancer Genome Atlas (TCGA) database were used to investigate the relationship of CD44 expression with prognosis, tumor mutational burden (TMB), and microsatellite instability (MSI), and to determine its potential prognostic value in pan-cancer. Patients were split into high-risk and low-risk cancer groups based on the survival outcomes of various cancer types. Additionally, the underlying mechanisms of CD44 in the tumor microenvironment (TME) were analyzed using the ESTIMATE and CIBERSORT algorithms and Gene Set Enrichment Analysis (GSEA). Subsequently, the biological role of CD44 at the single-cell level was investigated using the CancerSEA database. CD44 expression differed between tumor and adjacent normal tissues in the pan-cancer datasets, and survival analysis further revealed that CD44 expression was associated with multiple clinical annotations and survival indicators. In addition, the expression of CD44 was significantly associated with TMB and MSI in 10 types and 6 types of cancer, respectively, indicating that it could be exploited as a potential biomarker for predicting immunotherapy outcomes. Meanwhile, CD44 could influence several crucial immune cell-related pathways, and the results from the CancerSEA database showed a correlation of CD44 with malignant phenotype and functional states, further indicating that it can serve as a potential therapeutic target in cancer management. Our study demonstrated that CD44 shows promise as a prognostic biomarker in numerous cancers, which will assist in developing new strategies in cancer management.

Association between the expression of CD44 and clinical annotations, TMB and MSI. Clinical annotations (age, gender, and tumor stage) of pan-cancer patients were downloaded from the TCGA database, and Spearman correlation analysis between CD44 expression and clinical annotations was performed using the R packages "limma" and "ggpubr". The correlation of CD44 expression with MSI or TMB was analyzed using Spearman analysis and visualized as radar plots with the R package "fmsb". Next, the Tumor Immune Dysfunction and Exclusion (TIDE) database (http://tide.dfci.harvard.edu/) was used to assess the potential of CD44 as a response biomarker in cancer cohorts treated with immunotherapy. TIDE is a web application that integrates the expression profiles of T cell dysfunction and exclusion, thereby modeling immune evasion of tumor cells, and has the potential to predict the response to immune checkpoint blockade (ICB) 20.
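The Spearman analysis described above was run in R; as an illustration only, the following Python sketch shows what the statistic computes, namely the Pearson correlation of the rank-transformed values with ties given their average rank. The function names and the toy CD44 and TMB values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values for one cancer type (not data from the study).
cd44_expression = [5.1, 6.3, 4.8, 7.9, 6.0, 7.2]
tmb_per_sample = [1.2, 2.5, 0.9, 6.1, 2.5, 3.3]
rho = spearman_rho(cd44_expression, tmb_per_sample)
print(f"Spearman rho = {rho:.3f}")
```

In a pan-cancer setting this calculation would be repeated per cancer type, which yields one coefficient per axis of a radar plot.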

Relationship between CD44 expression, immune components, and tumor-infiltrating immune cell profiles.
To determine the proportion of immune and stromal components in the tumor microenvironment (TME), the ESTIMATE algorithm was applied to evaluate the associations of the immune and stromal scores with CD44 expression levels using the R packages "estimate" and "limma". Next, the relative levels of tumor-infiltrating immune cells (TICs) were calculated using the CIBERSORT algorithm, and tumor samples with P < 0.001 were retained for subsequent evaluations. Correlation analysis between CD44 expression and relative TIC levels was conducted using the R packages "ggplot2", "ggpubr", and "ggExtra".
Single-cell analysis for CD44. Cancer single-cell state atlas (CancerSEA) (http://biocc.hrbmu.edu.cn/CancerSEA/) is a dedicated database for exploring the distinct functional states of different cancer cells at single-cell resolution 21. We used the CancerSEA database to evaluate the functional role of CD44.
Immune-related genes and enrichment analysis in various cancer types. The R package "limma" was employed to perform co-expression analysis between CD44 and immune-related genes, and the results were visualized with the "reshape2" and "RColorBrewer" R packages. Then, GSEA, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses (www.kegg.jp/kegg/kegg1.html) 22, was carried out to explore the role of CD44 in pan-cancer, and the top five enrichment terms of each tumor type were illustrated using the R package "clusterProfiler".
Statistical analysis. Differences in CD44 expression between tumor tissues and normal tissues were assessed using the Wilcoxon test. In survival analysis, the relationship between CD44 expression and survival information in pan-cancer patients was determined using Kaplan-Meier and univariate Cox regression analyses. The data were analyzed with R software (Version 4.0.3) and Strawberry Perl (Version 5.30.0.1). P < 0.005 was considered statistically significant 23.
Ethical approval and consent to participate. Our study did not require ethical board approval because it did not contain human or animal trials.
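To make the Kaplan-Meier step named under Statistical analysis concrete, the following is a minimal Python sketch of the product-limit estimator applied to two expression groups. It is not the authors' R code. The median cut-off used to define high and low CD44 groups is an assumption (the text does not state the cut-off), and all names and toy values are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median (assumed cut-off)."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


# Hypothetical toy cohort (not data from the study).
cd44 = [2.1, 8.4, 3.3, 7.7, 1.9, 9.0, 6.5, 2.8]
months = [60, 12, 48, 20, 72, 8, 30, 55]
died = [0, 1, 1, 1, 0, 1, 1, 0]

groups = split_by_median(cd44)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(label, curves[label])
```

A log-rank test or Cox model would then be needed to attach a P value to the difference between the two curves; that step is omitted here.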

Results
Pan-cancer expression profiles of CD44. To analyze the expression profiles of CD44 in the pan-cancer dataset, CD44 expression was compared between cancer and control samples using the TCGA database. CD44 levels were significantly up-regulated in CHOL, COAD, ESCA, GBM, HNSC, KIRC, KIRP, READ, and THCA, whereas they were down-regulated in LUAD, PRAD, and UCEC (Fig. 1). Collectively, these results revealed differences in CD44 expression patterns between cancer and normal samples in the pan-cancer datasets.
Correlation analysis between CD44 expression and survival. Then, a correlation analysis between CD44 expression and the prognosis of pan-cancer patients was conducted. Survival indicators included overall survival (OS), progression-free interval (PFI), disease-free interval (DFI), and disease-specific survival (DSS). In the OS analysis, the Kaplan-Meier survival curves indicated that high expression of CD44 was associated with poor OS in LGG (P = 0.001) and MESO (P = 0.002) (Fig. 2A). In the PFI analysis, patients with higher CD44 expression had a shorter PFI in LGG (P < 0.001) (Fig. 2B). Likewise, patients with higher CD44 expression had a poorer DFI in PAAD (P = 0.004) (Fig. 2C). In addition, increased CD44 expression correlated with poorer DSS in patients with LGG (P < 0.001) (Fig. 2D).
Cox regression of OS identified CD44 expression as a risk factor in KIRC (P < 0.001), LGG (P < 0.001), and PAAD (P < 0.001); however, it appeared to be a protective factor in UVM (P = 0.002) (see Supplementary  (Table 2). Altogether, these results suggest that CD44 may serve as a prognostic biomarker and potential therapeutic target.
www.nature.com/scientificreports/
Correlation analysis between CD44 expression and pan-cancer clinicopathologic characteristics. Thereafter, the association between CD44 expression and clinicopathologic characteristics was investigated in pan-cancer datasets. In patients aged 65 years or younger, higher CD44 expression was noted in ESCA and UCEC. In contrast, CD44 expression was higher in patients over 65 years old in LUAD (see Supplementary Figure S2A
Correlation analysis of CD44 expression with TMB and MSI. TMB and MSI have been proposed to correlate with response to immunotherapy 24, and we evaluated the relationship of TMB and MSI status with CD44 expression to determine the potential of CD44 to reflect the efficacy of immunotherapy and to inform medication choices for cancer patients. High TMB was reported as a critical driver of cancer progression 25
Meanwhile, MSI also acts as a predictive biomarker, enabling more precise guidance of immunotherapy 26. Hence, the relationship of CD44 expression with MSI was analyzed. Our findings revealed a positive correlation between CD44 and MSI in COAD and UCEC. On the other hand, a negative correlation was found between CD44 and MSI in ESCA, HNSC, KIRC, and PRAD (see Supplementary Fig. S3B online). Next, we compared the ability of CD44 to predict immunotherapy efficacy with that of other canonical biomarker signatures in the TIDE database, using treatment responses from various cancer cohorts treated with ICB.
The results confirmed that CD44 had a medium predictive performance, with 10 of the 25 ICB-treated cohorts presenting an area under curve (AUC) greater than 0.5 (see Supplementary Figure S4 online).
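For reference, an AUC of 0.5 corresponds to a marker that ranks responders and non-responders no better than chance. The sketch below is a minimal Python illustration, not the TIDE implementation. It computes AUC as the fraction of responder/non-responder pairs in which the responder has the higher marker value, with ties counted as one half. The cohort values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : marker values (e.g. CD44 expression) per patient
    labels : 1 for responders, 0 for non-responders
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical cohort (not data from the study).
marker = [0.1, 0.4, 0.35, 0.8]
responder = [0, 0, 1, 1]
print(roc_auc(marker, responder))  # 3 of 4 pairs are ordered correctly: 0.75

# Counting cohorts whose AUC exceeds chance level, using made-up AUC values.
cohort_aucs = [0.62, 0.48, 0.55, 0.50, 0.71]
above_chance = sum(1 for auc in cohort_aucs if auc > 0.5)
print(f"{above_chance} of {len(cohort_aucs)} cohorts have AUC > 0.5")
```

An AUC only slightly above 0.5 indicates weak discrimination, so the number of cohorts above that threshold is best read alongside the individual AUC values.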
To sum up, these observations indicated that CD44 expression was correlated with TMB and MSI in multiple cancer types. Together with the results from the TIDE database, they suggest that CD44 could be used as a biomarker for predicting responses to immunotherapy.
Associations between CD44 expression and therapeutic response of targeted therapy in various cancers. Targeted therapy and immunotherapy have become mainstream in cancer treatment. However, only some subsets of patients benefit from these therapies, and more biomarkers need to be explored. We investigated the utility of CD44 in evaluating therapeutic responses to targeted therapy in various cancers (see Supplementary Figure S5 online). In BRCA treated with anti-HER2 therapy, CD44 expression was higher in non-responders, with an AUC of 0.588. Likewise, CD44 expression was higher in non-responders in colorectal carcinoma treated with bevacizumab, with an AUC of 0.64. Furthermore, CD44 expression was associated with relapse-free survival (RFS) at 12 months after targeted therapy in ovarian cancer, with an AUC of 0.733. Taken together, these results suggest that CD44 could act as a therapeutic response biomarker in various cancers.

Correlation between CD44 expression and various components in the TME of pan-cancer. Components of the TME include tumor cells, stromal cells, and immune cells 27, which can influence tumor formation, maintenance, and multidrug resistance, whereas non-malignant cells promote tumorigenesis in all stages of cancer 28. Subsequently, the pan-cancer types were divided into high-risk cancer groups (BLCA, KIRC, KIRP, and LGG) (Fig. 3A) and low-risk cancer groups (OV, PRAD, SARC, and UCEC) (Fig. 3B) according to the survival outcomes of the pan-cancer patients obtained from Kaplan-Meier curves and univariate Cox analysis. Unexpectedly, the immune score and stromal score were positively correlated with CD44 expression in every cancer type in both groups. Therefore, we hypothesized that CD44 could affect the immune and stromal components of the TME.
To further explore the association between CD44 expression and TIC subtypes, the CIBERSORT algorithm was used to calculate the relative levels of TIC subtypes in patients from both groups, with P < 0.001 as the cut-off value. In BLCA, CD44 expression levels were positively correlated with neutrophils and negatively correlated with naive B cells, plasma cells, and regulatory T cells (Tregs) (Fig. 4A-D). Moreover, CD44 expression was positively correlated with macrophages M0, activated memory CD4 T cells, and Tregs in KIRC, and negatively correlated with resting mast cells, monocytes, resting NK cells, and resting memory CD4 T cells (Fig. 4E-K). In addition, CD44 expression levels were significantly positively correlated with naive B cells, neutrophils, and activated memory CD4 T cells in KIRP but negatively correlated with memory B cells and macrophages M2 (Fig. 4L-P). A positive correlation between CD44 expression and resting memory CD4 T cells was also noted in LGG (Fig. 4Q). In OV, CD44 expression levels were positively correlated with resting dendritic cells, neutrophils, and plasma cells and negatively correlated with activated dendritic cells (Fig. 4R-U). In UCEC, CD44 expression levels were positively correlated with neutrophils and activated memory CD4 T cells and negatively correlated with activated NK cells and memory B cells (Fig. 4V-Y). Collectively, these results suggest that CD44 may mediate the immune response in these cancer types. Correlation analysis was then performed between CD44 and various immune-related genes (Fig. 5A) and immune checkpoint genes (Fig. 5B). The results suggested that CD44 may interfere with the TME by influencing the expression of immune-related genes and immune checkpoint genes, which mediates tumor progression and metastasis.
Functional states of CD44. Next, we explored the functional role of CD44 in the TME of various cancer types using the CancerSEA database, which shows the correlation of CD44 with malignant phenotype and functional states at single-cell resolution. The results showed that CD44 expression had a positive correlation with angiogenesis, differentiation, EMT, inflammation, and metastasis (Fig. 6A). Then, we evaluated the correlation between CD44 and the functional states in specific cancers. CD44 was positively correlated with metastasis, angiogenesis, EMT, and differentiation in LUAD (Fig. 6B), and with metastasis and inflammation in GBM (Fig. 6C). Therefore, we tentatively propose that CD44 may promote malignant phenotypes of cancer cells and could thus be used as a potential therapeutic target for some specific cancer types.

Cancers enrichment analysis.
To elucidate the underlying molecular mechanism of CD44 in tumorigenesis, GSEA was performed to assess the biological significance of CD44 expression in eight pan-cancer types (Fig. 7). In GO functional annotation, CD44 was significantly correlated with several immune-related functions in KIRC, KIRP, and UCEC, such as leukocyte migration and detection of chemical stimulus (Fig. 7A). Furthermore, KEGG analyses indicated that CD44 could positively influence several crucial immune cell-related pathways in KIRP and LGG, such as the toll-like receptor signaling pathway and Leishmania infection (Fig. 7B). Overall, these results suggest that CD44 is involved in TME remodeling in various cancers.

Discussion
Pan-cancer analysis can reveal the heterogeneity of tumors, providing insights into cancer treatment 29. Numerous pan-cancer studies have focused on gene mutations and cancer development, which are helpful for the development of sustainable, meaningful clinical treatments and of biomarkers 30. As previously reported, CD44 is overexpressed in cancer stem cells (CSCs) and plays a vital role in cancer progression, metastasis, and drug resistance 31. It may also serve as a therapeutic target, given that it modulates multiple survival signaling pathways 10,32. It has been previously reported that CD44 can promote CSC traits of metastatic breast cancers by activating the PDGFRβ/Stat3 signaling pathway 11. Accumulating evidence has revealed that CD44 might serve as a therapeutic biomarker in various tumor types 12,33. Although CD44 has been extensively studied in certain types of cancer, its role remains elusive in multiple cancers. In this research, we described the functional significance of CD44 and identified differential expression of CD44 between cancer and normal tissues in 12 cancer types from the pan-cancer datasets. Moreover, we found that CD44 expression was related to the levels of immune cell infiltration in various types of cancer using ESTIMATE and CIBERSORT. Lastly, GSEA revealed that CD44 was significantly correlated with several signaling pathways.
CSCs are hypothesized to possess the ability of self-renewal, tumor initiation, and metastasis. Prior research reported that the overexpression of CD44 in cancer cells is widely accepted as a marker of higher tumor-initiating potential and invasiveness 34,35. CD44 is recognized as a CSC surface marker for sorting cancer types such as breast cancer 9, prostate cancer 36, and gastric cancers 16.
Previous studies revealed that CD44 might be aberrantly expressed in several cancer types and play an essential role in cancer progression. Herein, significant upregulation of CD44 expression levels was observed in cancer tissues compared to normal tissues in CHOL, Interestingly, some studies reported contrasting outcomes. For instance, CD44 was reported to be up-regulated in LUAD, where it was linked to a significantly higher capacity to form tumorigenic colonies 37 and to worse OS 38. Notably, the functional role of CD44 in cancer development and progression has become a research hotspot and will assist in understanding its potential role as a prognostic biomarker for cancers. Furthermore, our results showed that a higher CD44 expression level was related to unfavorable survival outcomes in LGG, MESO, and PAAD. Similar outcomes were also observed in glioma patients 39. Evidence pooled from 42 studies showed that gastric cancer patients with CD44 overexpression had a lower 5-year OS rate 40. Some studies also reported similar results in colorectal cancer 41,42. Moreover, overexpression of CD44 predicted a poor prognosis in patients with hepatocellular carcinoma 43 and pancreatic carcinoma 44. In addition, another study revealed that expression of CD44 varied significantly by age and gender in oral cancer 45, which is consistent with the outcomes of this study, where CD44 was up-regulated in older patients with LUAD and down-regulated in older patients with ESCA and UCEC. Moreover, some studies have pointed to a novel potential therapeutic target, showing that survival outcomes are also affected by stem cells, which can be regulated by stemness-related genes 46,47. In short, these outcomes indicate that CD44 might be a useful biomarker in multiple cancer types.
Gene mutation is postulated to be the primary cause of cancer 48 , and specific gene mutations have distinct impacts on the prognosis and risk stratification of various cancer types 49 . TMB is defined as the number of somatic mutations per megabase of the interrogated genomic sequence, while MSI is defined as the collection of microsatellite mutations; both are widely used as predictive biomarkers of response to immunotherapy 50,51 .
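The TMB definition above reduces to a simple ratio, sketched here in Python as a worked example. The function name, the mutation count, and the 38 Mb covered region are illustrative assumptions and do not describe how TMB was derived in this study.

```python
def tumor_mutational_burden(somatic_mutation_count, covered_bases):
    """Somatic mutations per megabase of interrogated genomic sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    if somatic_mutation_count < 0:
        raise ValueError("somatic_mutation_count cannot be negative")
    megabases = covered_bases / 1_000_000
    return somatic_mutation_count / megabases


# Hypothetical sample: 190 somatic mutations over an assumed 38 Mb of covered sequence.
tmb = tumor_mutational_burden(190, 38_000_000)
print(f"TMB = {tmb} mutations/Mb")  # 190 / 38 = 5.0
```

Because the denominator is the sequence actually interrogated, TMB values from panels and from whole-exome data are only comparable when each is normalized by its own covered region.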
Additionally, recent studies have demonstrated that MSI and TMB contribute significantly to the therapeutic response to immune checkpoint inhibitors (ICIs) 52. The MSI-low phenotype was identified as a marker of worse prognosis in colorectal cancers 53. However, MSI has limitations, such as immune checkpoint blockade failing to elicit a response in colorectal cancer cases 54. This research established a relationship between CD44 and both TMB and MSI, implying that CD44 may provide a more comprehensive perspective on immunotherapy in these cancer types.
The MSI status may alter the TME of cancer patients, thereby affecting the efficacy of ICIs 55, while the TME plays a crucial role in tumorigenesis and cancer progression 56,57. Increasing evidence indicates that the immune escape of cancer cells is correlated with various components of the TME and ultimately contributes to tumor proliferation, metastasis, and recurrence. The effect of risk scores on the TME may also have essential roles in cancer development 58. Although immunotherapy has made considerable advances in cancer treatment, it still faces numerous challenges in its successful application 59,60. Indeed, to further improve the efficacy of immunotherapy, the identification of novel biomarkers is vital. Gomez et al. reported that CD44 expression was regulated by tumor-associated macrophages (TAMs), which directly influence CD44 signaling via ligand binding in HNSC 12. Nonetheless, little is known about the role of CD44 in the immune microenvironment. The results from this study indicated that CD44 level was significantly correlated with T cells, B cells, NK cells, macrophages, and other infiltrating immune cells in BLCA, KIRC, KIRP, LGG, OV, and UCEC. Taken together, it is reasonable to speculate that CD44 may play an essential role in cancer immunity and ultimately influence prognosis. This study still has some limitations: biological validation and validation in large cancer cohorts should be performed to better illustrate the role of CD44 in pan-cancer.

Conclusions
In summary, our results indicated that CD44 was associated with disease prognosis and immune infiltration in pan-cancers. Moreover, CD44 expression was also linked with TMB, MSI, and various components of the TME. These findings add to the understanding of tumor mechanisms and contribute to improving the efficacy of immunotherapy.

Data availability
The datasets supporting the conclusions of this article are available in the University of California Santa Cruz (UCSC) Xena repository (http://xena.ucsc.edu/).